Hi,
I've summarised my data using concat because I want to keep all unique values but it also repeats duplicates - is there any way to remove duplicates within a cell?
For example: change "A,A,A,B,B,C,D,E" to "A,B,C,D,E"?
Thanks
That was an interesting puzzle. Thanks for giving me some work to do while waiting on a plane.
regex_replace([field],"\b(\w+),(?=.*\b\1,?)","")
This will get you your result of A,B,C,D,E
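For anyone who wants to poke at the pattern outside Alteryx, here is a minimal sketch using Python's re module. It assumes Python's engine handles this pattern the same way Alteryx's does, which I have not verified, and the helper name is made up for illustration.

```python
import re

# Same pattern as the regex_replace call above: a word followed by a comma is
# dropped when the lookahead finds that word again later in the string.
PATTERN = re.compile(r"\b(\w+),(?=.*\b\1,?)")

def drop_earlier_duplicates(text):  # hypothetical helper name
    """Remove every 'word,' that has a later occurrence of the same word."""
    return PATTERN.sub("", text)

print(drop_earlier_duplicates("A,A,A,B,B,C,D,E"))  # A,B,C,D,E
```

Note that \w+ only matches a run of letters, digits and underscores, so values containing spaces or punctuation are not treated as one unit. The \b\1 in the lookahead can also match the start of a longer word: in this sketch "A,AB" comes back as "AB".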
How did I figure it out? Google:
I was also wondering whether it would be easier to add a Unique tool before your Summarize tool?
Should be a little more efficient too if you have a lot of data.
If you didn't want to get into the RegEx, another option would be to break the string out into individual rows using the Text to Columns tool, add a Unique tool after it, and then Cross Tab back to return the rows to a string.
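The split, de-duplicate, rejoin idea can be sketched in a few lines of Python. This is only an illustration of the logic, not what the Alteryx tools do internally; the function name and the default delimiter are assumptions.

```python
def unique_values(cell, delimiter=","):  # hypothetical helper name
    """Split a delimited cell, keep the first occurrence of each value, rejoin."""
    parts = cell.split(delimiter)            # like splitting the string to rows
    unique_parts = list(dict.fromkeys(parts))  # dict keys keep first-seen order
    return delimiter.join(unique_parts)      # back to a single string

print(unique_values("A,A,A,B,B,C,D,E"))  # A,B,C,D,E
```

Because whole values are compared, multi-word entries such as street addresses are handled the same way as single characters.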
Hi Joe,
There was only one character for each record before it was summarised, but after it was summarised (concatenated) the characters were combined into one string regardless of whether they were unique or duplicate values.
Thanks,
Heidi
Hi Heidi,
I am not completely sure that I follow....
In the stream, before the Summarize tool that concatenates them together, couldn't you add a Unique tool? Check the field you are concatenating (and quite probably another field you are grouping by).
This should then contain the list you require without the need for RegEx. Not that the result will be any different, as Mark's function works; it would just hopefully be a little more efficient.
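To show what I mean by de-duplicating before concatenating, here is a rough Python sketch. The (group, value) record layout and the function name are assumptions made up for the example; your real data will have its own fields.

```python
from collections import defaultdict

def concat_unique(records):  # hypothetical helper name
    """records: iterable of (group_key, value) pairs.

    Drops repeated (group_key, value) pairs first, then concatenates
    the remaining values per group with a comma.
    """
    seen = set()
    grouped = defaultdict(list)
    for key, value in records:
        if (key, value) not in seen:   # the "Unique" step
            seen.add((key, value))
            grouped[key].append(value)
    # the "Summarize / concatenate" step
    return {key: ",".join(values) for key, values in grouped.items()}

rows = [("g1", "A"), ("g1", "A"), ("g1", "B"), ("g2", "A"), ("g1", "C")]
result = concat_unique(rows)
print(result)  # {'g1': 'A,B,C', 'g2': 'A'}
```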
Thanks
Joe
Hello,
I tried using this expression to remove duplicate values within the same cell after concatenating rows. It does not seem to be working though. How would I modify this to remove strings that are multiple words and not just single character strings?
regex_replace([field],"\b(\w+),(?=.*\b\1,?)","")
Hello,
I have the same issue after using the same formula. regex_replace([field],"\b(\w+),(?=.*\b\1,?)","")
The commas were removed, but the duplicate values inside the cell were not.
From
8985 VENICE BLVD,8985 VENICE BLVD,8985 VENICE BLVD,8985 VENICE BLVD,8985 VENICE BLVD,8985 VENICE BLVD
Output I got is:
8985 VENICE BLVD 8985 VENICE BLVD 8985 VENICE BLVD 8985 VENICE BLVD 8985 VENICE BLVD 8985 VENICE BLVD
Thank you
Give this a try: regex_replace([Field1],'([^,]*)(\,\1)*(\,|$)','\1\3')
It worked for what I needed and when I ran it against your example I got: 8985 VENICE BLVD
Which I think is what you are looking for.
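Here is a small Python check of that expression, as a sketch only. It assumes Python's re module behaves like Alteryx's engine for this pattern, and the helper name is invented for the example.

```python
import re

# Same pattern and replacement as the regex_replace call above:
# group 1 is one comma-free value, group 2 swallows immediate repeats of it,
# group 3 is the trailing comma (or end of string) that is kept.
PATTERN = re.compile(r"([^,]*)(\,\1)*(\,|$)")

def collapse_adjacent(text):  # hypothetical helper name
    return PATTERN.sub(r"\1\3", text)

address = ",".join(["8985 VENICE BLVD"] * 6)
print(collapse_adjacent(address))                # 8985 VENICE BLVD
print(collapse_adjacent("CEN,NE,CEN,NE,RM,RM"))  # CEN,NE,CEN,NE,RM
```

As the second call shows, in this sketch only repeats that sit next to each other are collapsed, so it works best when the concatenated values are sorted first.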
I am having trouble with this formula. It seems to remove the commas (my delimiter) in all but the first set. I need to use the regex replace formula because I have multiple fields to edit and want to use the multi-field formula tool.
I changed it from "\w+" to ".+" since I have some non-alphanumeric characters in my data. I don't care about the order of the output, just that the values are unique. I can change the delimiter from comma to something else if that works better.
This is what I'm currently using: regex_replace([Regions Submitted],"\b(.+),(?=.*\b\1,?)",""). Sample data is attached.
Here is how I would like the formula to work:
input: CEN,NE,CEN,NE,CEN,NE,RM,RM,RM,RM,RM,RM
output: CEN, NE, RM (order doesn't matter here, as long as the three unique values are present, separated by a comma or other delimiter)
input: ICE CREAM BLACK SESAME,ICE CREAM BLACK SESAME
output: ICE CREAM BLACK SESAME
input: BIOSIL® HAIR, SKIN, NAILS
output: BIOSIL® HAIR, SKIN, NAILS
input: HUMPHRY SLOCOMBE | HMSLCM | 129921,HUMPHRY SLOCOMBE | HMSLCM | 129921
output: HUMPHRY SLOCOMBE | HMSLCM | 129921
input: email.last@test.com,email.last@test.com,email2.last@test.com
output: email.last@test.com,email2.last@test.com
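One possible direction, sketched in Python rather than Alteryx: match a whole comma-delimited value instead of a word, and drop it when the same whole value appears again later. The pattern and helper name below are my own assumptions, not something from the earlier replies, and I have only reasoned about it against Python's re module. Whether Alteryx's regex engine accepts the lookbehind and lazy quantifier in the same way is untested.

```python
import re

# (?<![^,])        value starts at the beginning of the string or after a comma
# ([^,]+),         one whole value followed by its comma
# (?=(?:[^,]*,)*?  lookahead: skip zero or more whole values ...
#    \1(?:,|$))    ... until the same whole value is found again
PATTERN = re.compile(r"(?<![^,])([^,]+),(?=(?:[^,]*,)*?\1(?:,|$))")

def keep_last_occurrence(text):  # hypothetical helper name
    """Remove every value that occurs again later; the last copy is kept."""
    return PATTERN.sub("", text)

samples = [
    "CEN,NE,CEN,NE,CEN,NE,RM,RM,RM,RM,RM,RM",
    "ICE CREAM BLACK SESAME,ICE CREAM BLACK SESAME",
    "BIOSIL® HAIR, SKIN, NAILS",
    "HUMPHRY SLOCOMBE | HMSLCM | 129921,HUMPHRY SLOCOMBE | HMSLCM | 129921",
    "email.last@test.com,email.last@test.com,email2.last@test.com",
]
for sample in samples:
    print(keep_last_occurrence(sample))
```

Since the last copy of each value is the one that survives, the output order follows the last occurrences, which should be fine if order does not matter.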